AITopics

Country: North America > United States > California (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (0.92)
Information Technology > Robotics & Automation (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Neural Information Processing SystemsJun-19-2026, 20:08:16 GMT

MergeBench: ABenchmark for Merging Domain-Specialized LLMs

Model merging provides a scalable alternative to multi-task training by combining specialized finetuned models through parameter arithmetic, enabling efficient deployment without the need for joint training or access to all task data. While recent methods have shown promise, existing evaluations are limited in both model scale and task diversity, leaving open questions about their applicability to large, domain-specialized LLMs. To tackle the challenges, we introduce MergeBench, a comprehensive evaluation suite designed to assess model merging at scale. MergeBench builds on state-of-the-art open-source language models, including Llama and Gemma families at 2B to 9B scales, and covers five key domains: instruction following, mathematics, multilingual understanding, coding and safety. We standardize finetuning and evaluation protocols, and assess eight representative merging methods across multi-task performance, forgetting and runtime efficiency. Based on extensive experiments, we provide practical guidelines for algorithm selection and share insights showing that model merging tends to perform better on stronger base models, with techniques such as merging coefficient tuning and sparsification improving knowledge retention. However, several challenges remain, including the computational cost on large models, the gap for in-domain performance compared to multi-task models, and the underexplored role of model merging in standard LLM training pipelines. We hope MergeBench provides a foundation for future research to advance the understanding and practical application of model merging. Our project page is at https://yifei-he.github.io/mergebench/.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Country: North America > United States (0.67)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)
Overview (0.67)

Industry:

Information Technology (0.67)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Neural Information Processing SystemsApr-24-2026, 16:10:40 GMT

11fc8c98b46d4cbdfe8157267228f7d7-Supplemental-Conference.pdf

artificial intelligence, conditional moe, machine learning, (16 more...)

Industry: Information Technology (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsFeb-12-2026, 06:00:46 GMT

Learning to Specialize with Knowledge Distillation for Visual Question Answering

Jonghwan Mun, Kimin Lee, Jinwoo Shin, Bohyung Han

Neural Information Processing Systems http://nips.cc/

accuracy, dataset, specialized model, (13 more...)

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Neural Information Processing SystemsFeb-7-2026, 13:05:57 GMT

11fc8c98b46d4cbdfe8157267228f7d7-Supplemental-Conference.pdf

We follow most of the settings in Uni-Perceiver [93]: cross-entropy loss with label smoothing of 0.1 is adopted for all tasks, and the negative samples for retrieval tasks are only from the local batch in the current GPU. We also apply the same data augmentation techniques as Uni-Perceiver [93] to image and video modalities to avoid overfitting. There are some setting changes to improve the training stability of the original Uni-Perceiver. Following [102], a uniform drop rate for stochastic depth is used across all encoder layers and are adapted according to the model size. Additionally, LayerScale [101] is used to facilitate the convergence of Transformer training, and the same initialization of10 3 is set to all models for simplicity.

artificial intelligence, cc3m, machine learning, (17 more...)

Country: North America > Canada > Ontario > Toronto (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

arXiv.org Artificial IntelligenceDec-8-2025

Retro-Expert: Collaborative Reasoning for Interpretable Retrosynthesis

Li, Xinyi, Wang, Sai, Lin, Yutian, Wu, Yu, Yang, Yi

Retrosynthesis prediction aims to infer the reactant molecule based on a given product molecule, which is a fundamental task in chemical synthesis. However, existing models rely on static pattern-matching paradigm, which limits their ability to perform effective logic decision-making, leading to black-box decision-making. Building on this, we propose Retro-Expert, an interpretable retrosyn-thesis framework that performs collaborative reasoning by combining the complementary reasoning strengths of Large Language Models and specialized models via reinforcement learning. It outputs natural language explanations grounded in chemical logic through three components: (1) specialized models analyze the product to construct high-quality chemical decision space, (2) LLM-driven critical reasoning to generate predictions and corresponding interpretable reasoning path, and (3) reinforcement learning optimizing interpretable decision policy. Experiments show that Retro-Expert not only surpasses both LLM-based and specialized models across different metrics but also provides expert-aligned explanations that bridge the gap between AI predictions and actionable chemical insights.

large language model, machine learning, reasoning process, (17 more...)

2508.10967

Genre: Research Report (0.64)

Industry: Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Neural Information Processing SystemsNov-20-2025, 14:23:58 GMT

Learning to Specialize with Knowledge Distillation for Visual Question Answering

Jonghwan Mun, Kimin Lee, Jinwoo Shin, Bohyung Han

The proposed framework is model-agnostic and applicable to any tasks other than VQA, e .

artificial intelligence, machine learning, specialized model, (15 more...)

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial IntelligenceNov-3-2025

Mano Technical Report

Fu, Tianyu, Su, Anyang, Zhao, Chenxu, Wang, Hanning, Wu, Minghui, Yu, Zhe, Hu, Fei, Shi, Mingjia, Dong, Wei, Wang, Jiayao, Chen, Yuyang, Yu, Ruiyang, Peng, Siran, Li, Menglin, Huang, Nan, Wei, Haitian, Yu, Jiawei, Xin, Yi, Zhao, Xilin, Gu, Kai, Jiang, Ping, Zhou, Sifan, Wang, Shuo

Graphical user interfaces (GUIs) are the primary medium for human-computer interaction, yet automating GUI interactions remains challenging due to the complexity of visual elements, dynamic environments, and the need for multi-step reasoning. Existing methods based on vision-language models (VLMs) often suffer from limited resolution, domain mismatch, and insufficient sequential decisionmaking capability. To address these issues, we propose Mano, a robust GUI agent built upon a multi-modal foundation model pre-trained on extensive web and computer system data. Our approach integrates a novel simulated environment for high-fidelity data generation, a three-stage training pipeline (supervised fine-tuning, offline reinforcement learning, and online reinforcement learning), and a verification module for error recovery. Mano demonstrates state-of-the-art performance on multiple GUI benchmarks, including Mind2Web and OSWorld, achieving significant improvements in success rate and operational accuracy. Our work provides new insights into the effective integration of reinforcement learning with VLMs for practical GUI agent deployment, highlighting the importance of domain-specific data, iterative training, and holistic reward design.

large language model, machine learning, reinforcement learning, (19 more...)

2509.17336

Country: Europe > Austria (0.28)

Genre:

Workflow (1.00)
Instructional Material (0.66)
Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
(2 more...)

arXiv.org Artificial IntelligenceOct-21-2025

MergeBench: A Benchmark for Merging Domain-Specialized LLMs

He, Yifei, Zeng, Siqi, Hu, Yuzheng, Yang, Rui, Zhang, Tong, Zhao, Han

Model merging provides a scalable alternative to multi-task training by combining specialized finetuned models through parameter arithmetic, enabling efficient deployment without the need for joint training or access to all task data. While recent methods have shown promise, existing evaluations are limited in both model scale and task diversity, leaving open questions about their applicability to large, domain-specialized LLMs. To tackle the challenges, we introduce MergeBench, a comprehensive evaluation suite designed to assess model merging at scale. MergeBench builds on state-of-the-art open-source language models, including Llama and Gemma families at 2B to 9B scales, and covers five key domains: instruction following, mathematics, multilingual understanding, coding and safety. We standardize finetuning and evaluation protocols, and assess eight representative merging methods across multi-task performance, forgetting and runtime efficiency. Based on extensive experiments, we provide practical guidelines for algorithm selection and share insights showing that model merging tends to perform better on stronger base models, with techniques such as merging coefficient tuning and sparsification improving knowledge retention. However, several challenges remain, including the computational cost on large models, the gap for in-domain performance compared to multi-task models, and the underexplored role of model merging in standard LLM training pipelines. We hope MergeBench provides a foundation for future research to advance the understanding and practical application of model merging. Our project page is at \href{https://yifei-he.github.io/mergebench/}{https://yifei-he.github.io/mergebench/}.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2505.10833

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial IntelligenceOct-17-2025

Harmonizing Diverse Models: A Layer-wise Merging Strategy for Consistent Generation

Peng, Xujun, Kumar, Anoop, Wu, Jingyu, Glenn, Parker, Liu, Daben

Retrieval-Augmented Generation (RAG) systems leverage Large Language Models (LLMs) to generate accurate and reliable responses that are grounded in retrieved context. However, LLMs often generate inconsistent outputs for semantically equivalent inputs, a problem compounded by the scarcity of consistency-focused training data and the limitations of current fine-tuning techniques in enhancing output consistency. We propose a new approach combining systematic synthetic data generation, triplet loss for better embeddings, and a novel layer-wise model merging approach. Using consistency-aware weights derived from intermediate layer activations, our method effectively integrates knowledge from specialized models. Experimental results how that our merged model significantly enhances output consistency, achieving a ~47.5\% improvement in response similarity over the baseline, thus offering a practical solution for increasing the reliability of an industrial RAG system.

large language model, machine learning, natural language, (20 more...)